Risk-Aware Multi-Armed Bandits With Refined Upper Confidence Bounds

Authors

Abstract

The classical multi-armed bandit (MAB) framework studies the exploration-exploitation dilemma of decision-making problems and always treats the arm with the highest expected reward as the optimal choice. However, in some applications, an arm with a high expected reward can be risky to play if its variance is high. Hence, the variation of the reward should be considered to make the arm-selection process risk-aware. In this letter, the mean-variance metric is investigated to measure the uncertainty of the received rewards. We first study the risk-aware MAB problem when the reward follows a Gaussian distribution, and a concentration inequality on the sample variance is developed to design a risk-aware upper confidence bound algorithm. Furthermore, we extend the algorithm to a novel asymptotic one by developing an upper confidence bound based on the asymptotic distribution of the sample variance. Theoretical analysis proves that both proposed algorithms achieve O(log(T)) regret. Finally, numerical results demonstrate that our algorithms outperform several existing algorithms.
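The letter's refined confidence bounds are not reproduced on this page, but a minimal sketch of a mean-variance style risk-aware UCB rule illustrates the general idea of being optimistic about the mean while penalising uncertain, high-variance arms. The risk weight `rho` and the generic log-based confidence radii below are illustrative placeholders, not the paper's actual bounds:

```python
import numpy as np

def risk_aware_ucb_index(means, variances, counts, t, rho=1.0):
    """Illustrative mean-variance UCB index (not the letter's exact bound).

    means, variances, counts: empirical per-arm statistics.
    t: current round; rho: weight on the variance (risk) term.
    A larger index means a more attractive arm.
    """
    # Hypothetical confidence radii; the letter derives sharper,
    # distribution-specific bounds in their place.
    mean_radius = np.sqrt(2.0 * np.log(t) / counts)
    var_radius = np.sqrt(2.0 * np.log(t) / counts)
    # Optimistic estimate of the mean, pessimistic (lower) estimate of the variance.
    return (means + mean_radius) - rho * np.maximum(variances - var_radius, 0.0)

# Toy usage on a 3-armed Gaussian bandit.
rng = np.random.default_rng(0)
true_mu, true_sigma = np.array([1.0, 1.2, 1.2]), np.array([0.1, 1.0, 0.3])
rewards = [[] for _ in true_mu]

T = 2000
for t in range(1, T + 1):
    if t <= len(true_mu):                      # play each arm once first
        arm = t - 1
    else:
        counts = np.array([len(r) for r in rewards])
        means = np.array([np.mean(r) for r in rewards])
        variances = np.array([np.var(r) for r in rewards])
        arm = int(np.argmax(risk_aware_ucb_index(means, variances, counts, t)))
    rewards[arm].append(rng.normal(true_mu[arm], true_sigma[arm]))
```

The index favours arms whose plausible mean is high and whose plausible variance is low; the paper's contribution lies in the tighter Gaussian and asymptotic bounds that replace the generic radii used here.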


Similar Articles

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MaRaB) algorithm. With the goal of limiting the exploration of risky arms, MaRaB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MaRaB tends toward the MIN mu...
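As a rough illustration of conditional value at risk (CVaR) as an arm-quality score, the sketch below computes an empirical CVaR at a user-supplied risk level `alpha`; the function and the two-arm comparison are illustrative and not MaRaB's exact selection rule:

```python
import numpy as np

def empirical_cvar(samples, alpha=0.1):
    """Average of the worst alpha-fraction of observed rewards.

    As alpha -> 0 this tends toward the worst observed reward,
    mirroring the essential-infimum behaviour described above.
    """
    samples = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[:k].mean()

# Example: two arms with equal means but different risk.
rng = np.random.default_rng(1)
safe_arm = rng.normal(1.0, 0.1, size=500)
risky_arm = rng.normal(1.0, 1.0, size=500)
print(empirical_cvar(safe_arm, 0.1))   # close to the mean
print(empirical_cvar(risky_arm, 0.1))  # much lower: heavy left tail
```

A risk-aware learner ranking arms by this score would prefer the low-variance arm even though both have the same expected reward.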


Tighter Bounds for Multi-Armed Bandits with Expert Advice

Bandit problems are a classic way of formulating exploration versus exploitation tradeoffs. Auer et al. [ACBFS02] introduced the EXP4 algorithm, which explicitly decouples the set of A actions which can be taken in the world from the set of M experts (general strategies for selecting actions) with which we wish to be competitive. Auer et al. show that EXP4 has expected cumulative regret bounded...
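For context, a compact sketch of one round of EXP4 in its standard form (uniform-exploration parameter `gamma` and importance-weighted reward estimates; not the tighter variant analysed in that paper) is given below:

```python
import numpy as np

def exp4_round(weights, advice, reward_fn, gamma=0.1):
    """One round of EXP4 in its standard form.

    weights:   current expert weights, shape (M,).
    advice:    advice[e, a] = probability expert e assigns to action a, shape (M, A).
    reward_fn: maps the chosen action to a reward in [0, 1].
    Returns the updated weights and the chosen action.
    """
    M, A = advice.shape
    q = weights / weights.sum()
    # Mix the experts' advice, then mix in uniform exploration.
    p = (1.0 - gamma) * q @ advice + gamma / A
    action = np.random.choice(A, p=p)
    reward = reward_fn(action)
    # Importance-weighted reward estimate for every action.
    x_hat = np.zeros(A)
    x_hat[action] = reward / p[action]
    # Credit each expert with its expected estimated reward, then update.
    y_hat = advice @ x_hat
    return weights * np.exp(gamma * y_hat / A), action

# Minimal usage: two experts, two actions, action 1 always pays 1.
weights = np.ones(2)
advice = np.array([[0.9, 0.1], [0.2, 0.8]])
weights, a = exp4_round(weights, advice, lambda a: float(a == 1))
```

The decoupling mentioned above is visible in the shapes: the weight vector lives over the M experts, while exploration and reward estimation happen over the A actions.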


Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, which was later extended by Burnetas and Katehakis, established a logarithmic lower bound for all consistent policies. We relax the notion of consistency, and exhibit a generalisation of the logarithmic bound. We also show the non existen...
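The logarithmic bound referred to here is commonly stated in the Lai-Robbins form (the notation below is the standard one, not necessarily that paper's): for every consistent policy and every suboptimal arm $i$,

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[N_i(T)]}{\log T}
\;\ge\;
\frac{1}{\mathrm{KL}(\nu_i, \nu^{*})},
```

where $N_i(T)$ is the number of pulls of arm $i$ up to horizon $T$, $\nu_i$ is its reward distribution, $\nu^{*}$ is that of an optimal arm, and $\mathrm{KL}$ denotes the Kullback-Leibler divergence.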


Risk-Aversion in Multi-armed Bandits

Stochastic multi-armed bandits solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off...
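A common formalisation of such a risk-return trade-off, assumed here for illustration rather than quoted from that paper, scores each arm by a mean-variance quantity with a risk-tolerance parameter $\rho$:

```latex
\mathrm{MV}_i \;=\; \sigma_i^{2} \;-\; \rho\,\mu_i ,
\qquad
i^{*} \;=\; \arg\min_{i} \mathrm{MV}_i ,
```

where $\mu_i$ and $\sigma_i^{2}$ are the mean and variance of arm $i$'s rewards; regret is then measured against $i^{*}$ rather than against the arm with the largest mean.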


Anytime Exploration for Multi-armed Bandits using Confidence Information

We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the top-m arms at every time step. Anytime Explore-m is more practical than fixed budget or fixed confidence formulations of the top-m problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithm...
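A minimal sketch of the anytime aspect, assuming a plain round-robin sampling rule instead of the paper's confidence-based one, shows how a top-m prediction can be read off at every time step:

```python
import numpy as np

def predict_top_m(rewards, m):
    """Return the indices of the m arms with the highest empirical means."""
    means = np.array([np.mean(r) if r else -np.inf for r in rewards])
    return np.argsort(means)[-m:][::-1]

rng = np.random.default_rng(2)
true_means = [0.2, 0.5, 0.8, 0.6]
rewards = [[] for _ in true_means]

for t in range(200):
    arm = t % len(true_means)                 # placeholder sampling rule
    rewards[arm].append(rng.normal(true_means[arm], 1.0))
    top2 = predict_top_m(rewards, m=2)        # a prediction is available at every t
```

The point of the anytime formulation is that the loop can be stopped at any step and still yield a usable top-m prediction; the quality of that prediction is what the confidence-based sampling rule in the paper is designed to improve.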


Journal

Journal title: IEEE Signal Processing Letters

Year: 2021

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2020.3047725